Day 25 NLP 中的依存句法分析（Dependency Parsing）介紹與實作

2022 iThome 鐵人賽

DAY 25

AI & Data

語言學與NLP系列第 25 篇

14th鐵人賽 #python #dependency tree #nlp #natural language process

cjom06991

團隊KnULPers_from_NCCU

2022-10-10 20:36:55

4034 瀏覽

分享至

連續看了那麼多天的模型訓練相關知識，我們今天來看看比較「語言學」會做的任務吧！

句法分析(syntactic parsing )是自然語言處理中一個重要的任務，其目標是分析句子的文法結構並將其表示為較為直觀的結構(通常是樹形結構，即所謂句法樹)。

為什麼是「句法樹」呢？因為其實語言是有階層關係的，固定數量的文法結構能夠生成無數句子（文法是有限的，而句子是無限的）。也就是說句法分析主要著重在「如何」產生一個符合文法的句子。如圖所示：

tree

我們透過這樣的句法分析就能明確的知道，要構成一個句子可能需包含名詞片語（NP）、動詞片語（VP）和可有可無的 preposition phrase (PP)。

圖片來源：(https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.nltk.org%2Fbook%2Fch08.html&psig=AOvVaw17NGJ9OhnD35_fxT-aDEOV&ust=1663863658633000&source=images&cd=vfe&ved=0CAwQjRxqFwoTCJixhqmlpvoCFQAAAAAdAAAAABAD
)

而不同句法分析所使用的句法樹，依存句法樹並不關注如何生成句子這種問題。依存句法是分析句子中的語法結構並找出相關單詞以及它們之間關係類型的過程。

每個句法關係需包含以下兩個條件：

有一個 head 和一個修飾 head 的 dependent 。
根據 head 和 dependent 之間的依賴性質進行標記。可以在 Universal Dependency Relation 中找到標記的規則和意思 (https://universaldependencies.org/u/dep/)。

例如以下這張圖片：

dependency

圖片來源：(https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.researchgate.net%2Ffigure%2FDependency-Parse-Tree-of-an-Example-English-Sentence_fig1_220816955&psig=AOvVaw1vRqBQXtHU-qH4ynmQMPI2&ust=1663864523196000&source=images&cd=vfe&ved=0CAwQjRxqFwoTCOiwvceopvoCFQAAAAAdAAAAABAN)

句子的 root 就是 hit 這個動詞，接著就是去找每個 token 和 root 的關聯。John 就是名詞主語（nominal subject, nsubj）、can 是助動詞（auxiliary）、ball 是直接受詞（direct object, dobj）。而 the 在這個句子中與 root 沒有關係，是跟直接受詞 ball 有關係，the 是 ball 的限定詞（determiner, det）。最後就是標點符號（punctuation, p）

了解完畢什麼是 dependency parsing 後我們就直接來用 Python 做做看吧！

Python Dependency Parsing 實作 -- 方法一


import stanza
stanza.download('en') # en 改成 zh 就可以 parse 中文了


nlp = stanza.Pipeline('en',
                     processors = 'tokenize,mwt,pos,lemma,depparse')


sentence = 'I want to know how to conduct the dependency parsing task.' # 要 parse 的句子
doc = nlp(sentence)
doc.sentences[0].print_dependencies() # 因為只有 1 個句子，取 list 第一個就好



sent_dict = doc.sentences[0].to_dict()


# 印出 token, relation and head
print ("{:<15} | {:<10} | {:<15} ".format('Token', 'Relation', 'Head'))
print ("-" * 50)
for word in sent_dict:
  print ("{:<15} | {:<10} | {:<15} "
         .format(str(word['text']),str(word['deprel']), str(sent_dict[word['head']-1]['text'] if word['head'] > 0 else 'ROOT')))

執行結果為：

Token | Relation | Head

Python Dependency Parsing 實作 -- 方法二

第二種方式我使用中文來做示範


!pip install -U pip setuptools wheel
!pip install -U spacy
!python -m spacy download zh_core_web_sm # zh_core_web_sm 是中文，en_core_web_sm 是英文


import spacy
from spacy import displacy
nlp = spacy.load("zh_core_web_sm")


sentence = '今天學會什麼是句法的依存關係了！'

# nlp 函數返回一個具有各個標記訊息、語言特徵和關係的列表

doc = nlp(sentence)

print ("{:<15} | {:<8} | {:<15} | {:<20}".format('Token','Relation','Head', 'Children'))
print ("-" * 70)

for token in doc:
  # Print the token, dependency nature, head and all dependents of the token
  print ("{:<15} | {:<8} | {:<15} | {:<20}"
         .format(str(token.text), str(token.dep_), str(token.head.text), str([child for child in token.children])))

執行結果為：

Token | Relation | Head | Children

今天 | nmod:tmod | 關係 | []
學會 | compound:nn | 什麼 | []
什麼 | nsubj | 關係 | [學會]
是 | cop | 關係 | []
句法 | nmod:assmod | 依存 | [的]
的 | case | 句法 | []
依存 | compound:nn | 關係 | [句法]
關係 | ROOT | 關係 | [今天, 什麼, 是, 依存, 了, ！]
了 | dep | 關係 | []
！ | punct | 關係 | []

接著把 dependency tree 畫出來


displacy.render(doc, style='dep', jupyter=True, options={'distance': 120})

執行結果為：

參考資料：(https://towardsdatascience.com/natural-language-processing-dependency-parsing-cf094bbbe3f7)

Day 24 看不見的手？隱馬可夫模型（Hidden Markov Model）介紹與實作

Day 26 循環神經網路 RNN 與長短期記憶 LSTM 介紹與實作

系列文

語言學與NLP 共 30 篇

RSS系列文訂閱系列文

16 人訂閱

完整目錄

熱門推薦

{{ item.channelVendor }} | {{ item.webinarstarted }} |

直播中

尚未有邦友留言

立即登入留言

參賽組數

902 組

團體組數

37 組

累計文章數

19855 篇

完賽人數

528 人

15th鐵人賽 16th鐵人賽 13th鐵人賽 14th鐵人賽 17th鐵人賽 12th鐵人賽 11th鐵人賽鐵人賽 2019鐵人賽 javascript 2018鐵人賽 python 2017鐵人賽 windows php c# linux windows server css react

IT邦幫忙

語言學與NLP系列 第 25 篇